Goto

Collaborating Authors

 sequence element




The $\alpha$-Alternator: Dynamic Adaptation To Varying Noise Levels In Sequences Using The Vendi Score For Improved Robustness and Performance

arXiv.org Machine Learning

Current state-of-the-art dynamical models, such as Mamba, assume the same level of noisiness for all elements of a given sequence, which limits their performance on noisy temporal data. In this paper, we introduce the $\alpha$-Alternator, a novel generative model for time-dependent data that dynamically adapts to the complexity introduced by varying noise levels in sequences. The $\alpha$-Alternator leverages the Vendi Score (VS), a flexible similarity-based diversity metric, to adjust, at each time step $t$, the influence of the sequence element at time $t$ and the latent representation of the dynamics up to that time step on the predicted future dynamics. This influence is captured by a parameter that is learned and shared across all sequences in a given dataset. The sign of this parameter determines the direction of influence. A negative value indicates a noisy dataset, where a sequence element that increases the VS is considered noisy, and the model relies more on the latent history when processing that element. Conversely, when the parameter is positive, a sequence element that increases the VS is considered informative, and the $\alpha$-Alternator relies more on this new input than on the latent history when updating its predicted latent dynamics. The $\alpha$-Alternator is trained using a combination of observation masking and Alternator loss minimization. Masking simulates varying noise levels in sequences, enabling the model to be more robust to these fluctuations and improving its performance in trajectory prediction, imputation, and forecasting. Our experimental results demonstrate that the $\alpha$-Alternator outperforms both Alternators and state-of-the-art state-space models across neural decoding and time-series forecasting benchmarks.


Associative Knowledge Graphs for Efficient Sequence Storage and Retrieval

arXiv.org Artificial Intelligence

This paper presents a novel approach for constructing associative knowledge graphs that are highly effective for storing and recognizing sequences. The graph is created by representing overlapping sequences of objects, as tightly connected clusters within the larger graph. Individual objects (represented as nodes) can be a part of multiple sequences or appear repeatedly within a single sequence. To retrieve sequences, we leverage context, providing a subset of objects that triggers an association with the complete sequence. The system's memory capacity is determined by the size of the graph and the density of its connections. We have theoretically derived the relationships between the critical density of the graph and the memory capacity for storing sequences. The critical density is the point beyond which error-free sequence reconstruction becomes impossible. Furthermore, we have developed an efficient algorithm for ordering elements within a sequence. Through extensive experiments with various types of sequences, we have confirmed the validity of these relationships. This approach has potential applications in diverse fields, such as anomaly detection in financial transactions or predicting user behavior based on past actions.


StemGen: A music generation model that listens

arXiv.org Artificial Intelligence

End-to-end generation of musical audio using deep learning techniques has seen an explosion of activity recently. However, most models concentrate on generating fully mixed music in response to abstract conditioning information. In this work, we present an alternative paradigm for producing music generation models that can listen and respond to musical context. We describe how such a model can be constructed using a non-autoregressive, transformer-based model architecture and present a number of novel architectural and sampling improvements. We train the described architecture on both an open-source and a proprietary dataset. We evaluate the produced models using standard quality metrics and a new approach based on music information retrieval descriptors. The resulting model reaches the audio quality of state-of-the-art text-conditioned models, as well as exhibiting strong musical coherence with its context.


Markov Networks for Detecting Overalpping Elements in Sequence Data

Neural Information Processing Systems

Many sequential prediction tasks involve locating instances of pat- terns in sequences. Generative probabilistic language models, such as hidden Markov models (HMMs), have been successfully applied to many of these tasks. A limitation of these models however, is that they cannot naturally handle cases in which pattern instances overlap in arbitrary ways. We present an alternative approach, based on conditional Markov networks, that can naturally repre- sent arbitrarily overlapping elements. We show how to efficiently train and perform inference with these models.


Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs

arXiv.org Artificial Intelligence

In principle, applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning. However, training sequence VAEs is challenging: autoregressive decoders can often explain the data without utilizing the latent space, known as posterior collapse. To mitigate this, state-of-the-art models'weaken' the'powerful' decoder by applying uniformly random dropout to the decoder input. We show theoretically that this removes pointwise mutual information provided by the decoder input, which is compensated for by utilizing the latent space. We then propose an adversarial training strategy to achieve information-based stochastic dropout. Compared to uniform dropout on standard text benchmark datasets, our targeted approach increases both sequence modeling performance and the information captured in the latent space.


Shift-Equivariant Similarity-Preserving Hypervector Representations of Sequences

arXiv.org Artificial Intelligence

Hyperdimensional Computing (HDC), also known as Vector-Symbolic Architectures (VSA), is a promising framework for the development of cognitive architectures and artificial intelligence systems, as well as for technical applications and emerging neuromorphic and nanoscale hardware. HDC/VSA operate with hypervectors, i.e., distributed vector representations of large fixed dimension (usually > 1000). One of the key ingredients of HDC/VSA are the methods for encoding data of various types (from numeric scalars and vectors to graphs) into hypervectors. In this paper, we propose an approach for the formation of hypervectors of sequences that provides both an equivariance with respect to the shift of sequences and preserves the similarity of sequences with identical elements at nearby positions. Our methods represent the sequence elements by compositional hypervectors and exploit permutations of hypervectors for representing the order of sequence elements. We experimentally explored the proposed representations using a diverse set of tasks with data in the form of symbolic strings. Although our approach is feature-free as it forms the hypervector of a sequence from the hypervectors of its symbols at their positions, it demonstrated the performance on a par with the methods that apply various features, such as subsequences. The proposed techniques were designed for the HDC/VSA model known as Sparse Binary Distributed Representations. However, they can be adapted to hypervectors in formats of other HDC/VSA models, as well as for representing sequences of types other than symbolic strings.


Recurrent Neural Networks (RNNs): A gentle Introduction and Overview

arXiv.org Machine Learning

State-of-the-art solutions in the areas of "Language Modelling & Generating Text", "Speech Recognition", "Generating Image Descriptions" or "Video Tagging" have been using Recurrent Neural Networks as the foundation for their approaches. Understanding the underlying concepts is therefore of tremendous importance if we want to keep up with recent or upcoming publications in those areas. In this work we give a short overview over some of the most important concepts in the realm of Recurrent Neural Networks which enables readers to easily understand the fundamentals such as but not limited to "Backpropagation through Time" or "Long Short-Term Memory Units" as well as some of the more recent advances like the "Attention Mechanism" or "Pointer Networks". We also give recommendations for further reading regarding more complex topics where it is necessary.


Agglomerative Attention

arXiv.org Machine Learning

To enable efficient modeling of longer Neural networks using transformer-based architectures sequences with greater correlation lengths, recent research have recently demonstrated great power and has focused on finding more scalable attention flexibility in modeling sequences of many types. One mechanisms [4, 5, 12-14]. of the core components of transformer networks is In this work we present an attention mechanism the attention layer, which allows contextual information that is linear in time and memory requirements. This to be exchanged among sequence elements. "agglomerative attention"--loosely inspired by ideas While many of the prevalent network structures thus from protein folding--works by defining a fixed number far have utilized full attention--which operates on all of classes. Target sequence elements assigned to pairs of sequence elements--the quadratic scaling of each class receive a summary representation of all reference this attention mechanism significantly constrains the elements belonging to that class. We measure size of models that can be trained. In this work, we the impact of the agglomerative attention algorithm present an attention model that has only linear requirements by replacing the full dot product self-attention layers in memory and computation time. We of universal transformers [2] with agglomerative show that, despite the simpler attention model, networks attention and measure model performance on both using this attention mechanism can attain comparable character-and word-level language modeling tasks.